{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "ML Debugging is Hard.ipynb",
      "version": "0.3.2",
      "provenance": [
        {
          "file_id": "1K6dyxkBfrXVxppRzGHUnMFnK2Otnlp5i",
          "timestamp": 1541016529771
        }
      ],
      "collapsed_sections": [
        "JndnmDMp66FL"
      ],
      "last_runtime": {
        "build_target": "//learning/brain/python/client:colab_notebook",
        "kind": "private"
      }
    },
    "kernelspec": {
      "display_name": "Python 2",
      "name": "python2"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "JndnmDMp66FL"
      },
      "source": [
        "#### Copyright 2018 Google LLC."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "id": "hMqWDc_m6rUC",
        "colab": {}
      },
      "source": [
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "91QkRICDEYqT"
      },
      "source": [
        "# Counterintuitive Challenges in ML Debugging"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "9vC0sXBEEgUF"
      },
      "source": [
        "In this Colab, you will explore why ML debugging is harder than traditional debugging by debugging a simple regression problem, with one feature and one label. You will:\n",
        "\n",
        "* Create the dataset.\n",
        "* Try to fit the data with a simple model.\n",
        "* Debug the model.\n",
        "* Demonstrate exploding gradients.\n",
        "\n",
        "Please **make a copy** of this Colab before running it. Click on *File*, and then click on *Save a copy in Drive*."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "5IPfZryiJJXv"
      },
      "source": [
        "# Case Study: Debugging a Simple Model"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "O-vpfBAN48gW"
      },
      "source": [
        "## Create the Dataset"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "_OeRWVAGvF63"
      },
      "source": [
        "Run the cells below to load libraries."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "SYj-8T48e6Rw",
        "colab": {}
      },
      "source": [
        "# Reset environment for a new run\n",
        "% reset -f\n",
        "\n",
        "# Load Libraries\n",
        "from os.path import join # for joining file pathnames\n",
        "import pandas as pd\n",
        "import tensorflow as tf\n",
        "from tensorflow import keras\n",
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "\n",
        "# Set Pandas display options\n",
        "pd.options.display.max_rows = 10\n",
        "pd.options.display.float_format = '{:.1f}'.format"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Ox6Jlt_rj8s0"
      },
      "source": [
        "Create the data. Your data consists of one feature with values 0 to 9, and your labels are the same data with some noise added. In a dataset, by convention, rows are examples and columns are features. To match this convention, transpose your data. Before transposing your vectors, you must convert them to matrices."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "4aN24LlKj7LM",
        "colab": {}
      },
      "source": [
        "features = np.array(range(10))\n",
        "features = features[:, np.newaxis]\n",
        "# Create labels by adding noise distributed around 0\n",
        "labels = features + np.random.random(size=[10,1]) - 0.5"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "bMXvcL7fpkG2"
      },
      "source": [
        "Verify that the data roughly lies in a straight line and, therefore, is easily predicted..\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "P6fomFA9pnrF",
        "colab": {}
      },
      "source": [
        "# Visualize the data\n",
        "plt.scatter(features,labels)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "MAVm0L6GCwrs"
      },
      "source": [
        "## Fit Simple Data with Simple Model"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "vgn_hESwkhmd"
      },
      "source": [
        "TensorFlow provides several different APIs. This Colab only demonstrates the Keras API since Keras lets you quickly train models in a few lines of code using high-level APIs. In Keras, the typical neural network is a `sequential` model with fully-connected, or `dense`, layers.\n",
        "\n",
        "Your dataset is simple. A neural network with just 1 neuron should learn your dataset. Define a neural network with 1 layer having 1 neuron using the model type `keras.Sequential` with the layer type `keras.layers.Dense`. To understand the Keras code, read the code comments. Then run the cell. The code prints the model summary to show a model with 1 layer and 2 parameters (weight and bias)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "E2SHLw83z4fF",
        "colab": {}
      },
      "source": [
        "# Delete any existing assignment to \"model\"\n",
        "model = None\n",
        "\n",
        "# Use a sequential model\n",
        "model = keras.Sequential()\n",
        "\n",
        "# Add a layer with 1 neuron. Use the popular \"tanh\" activation function\n",
        "model.add(keras.layers.Dense(units=1,             # 1 neuron\n",
        "                             activation='tanh',   # 'tanh'\n",
        "                             input_dim=1))         # number of feature cols=1\n",
        "\n",
        "# Model calculates loss using mean-square error (MSE)\n",
        "# Model trains using Adam optimizer with learning rate = 0.001\n",
        "model.compile(optimizer=tf.train.AdamOptimizer(0.001),\n",
        "              loss='mse',\n",
        "             )\n",
        "\n",
        "print(model.summary())"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ERliMzkpvm8n"
      },
      "source": [
        "Now, train the model."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "ldasx-XNvr53",
        "colab": {}
      },
      "source": [
        "model.fit(x=features,\n",
        "          y=labels,\n",
        "          epochs=10,    # train for 10 epochs\n",
        "          batch_size=10,# use 10 examples per batch\n",
        "          verbose=1)    # verbose=1 prints progress per epoch"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "CMcdO6GPnBCK"
      },
      "source": [
        "Your loss stubbornly refuses to decrease! Review your approach keeping in mind the guidance on the [model development process](https://developers.google.com/machine-learning/testing-debugging/common/overview). \n",
        "\n",
        "The following list describes possible actions to debug your model. Read the actions and their explanations to understand how debugging in ML requires you to sort through multiple possibilities at once. If an action sounds promising, experiment by modifying the code above.\n",
        "\n",
        "* **Transforming data**: You data is not transformed. You can experiment by transforming the data appropriately and retraining the model.\n",
        "* **Activation function**: The `tanh` activation function cannot predict values >1. Besides, in a regression problem, the last layer should always use the linear activation function. Therefore, should you use  `activation='linear'`?\n",
        "* **Hyperparameter values**: Should you adjust any hyperparameter values to try reducing loss?\n",
        "* **Simpler model**: The model development process recommends starting with a simple model. A linear model is simpler than your nonlinear model. Should you use `activation='linear'`?\n",
        "* **Change optimizer**: Your model uses the Adam optimizer. You can fall back to the gradient descent optimizer by using `optimizer=keras.optimizers.SGD()`.\n",
        "\n",
        "Consider these actions and experiment where necessary. Then read the following section for the solution."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Ej14ORiDUIM3"
      },
      "source": [
        "## Solution: Getting Loss to Decrease"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "LfGdTkrdUJcm"
      },
      "source": [
        "Before trying to adjust specific model parameters, such as the hyperparameter values, you should first check for good development practices. Here, you should start with a linear model because of these two best practices:\n",
        "\n",
        "* Regression: In a regression problem, the last layer must always be linear.\n",
        "* Start simple: Since a linear model is simpler than a nonliner model, start with a linear model.\n",
        "\n",
        "Run the following code to train a linear model and check if your loss decreases. The code displays the loss curve."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "8fKzl07oWjcD",
        "colab": {}
      },
      "source": [
        "model = None\n",
        "model = keras.Sequential()\n",
        "model.add(keras.layers.Dense(1, activation='linear', input_dim=1))\n",
        "model.compile(optimizer=tf.train.AdamOptimizer(0.001), loss='mse')\n",
        "trainHistory = model.fit(features, labels, epochs=10, batch_size=1, verbose=1)\n",
        "# Plot loss curve\n",
        "plt.plot(trainHistory.history['loss'])\n",
        "plt.title('Loss Curves')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "CkcKovAAk4r_"
      },
      "source": [
        "Your loss decreases, albeit slowly! You're on the right track. How can you get your loss to converge? Experiment with the code above. For the solution, read the following section."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "eXckvj-FlEzl"
      },
      "source": [
        "## Solution: Reaching Convergence"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "DgBZXOuClIeX"
      },
      "source": [
        "Your loss isn't decreasing fast enough. From the guidance on [Learning Rate](https://developers.google.com/machine-learning/crash-course/reducing-loss/learning-rate), you know that you can increase the learning rate to train faster. Run the following code to increase the learning rate to 0.1. The the model reaches convergence quickly."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "RRkbxeLVlZoy",
        "colab": {}
      },
      "source": [
        "model = None\n",
        "model = keras.Sequential()\n",
        "model.add(keras.layers.Dense(1, activation='linear', input_dim=1))\n",
        "model.compile(optimizer=tf.train.AdamOptimizer(0.1), loss='mse')\n",
        "model.fit(features, labels, epochs=5, batch_size=1, verbose=1)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "u7Mk-ivUgf9o"
      },
      "source": [
        "Wonderful, you quickly get a very low loss! Let's confirm the model works by predicting results for values [0,9] and superimposing them on top of the features."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "WRgXYbBst0Pd",
        "colab": {}
      },
      "source": [
        "# get predictions\n",
        "featuresPred = model.predict(features, verbose=1)\n",
        "# Plot original features and predicted values\n",
        "featuresPred = np.transpose(featuresPred)\n",
        "plt.scatter(range(10), labels, c=\"blue\")\n",
        "plt.scatter(range(10), featuresPred, c=\"red\")\n",
        "plt.legend([\"Original\", \"Predicted\"])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "8s8gb4rTTei7"
      },
      "source": [
        "Yes, the predictions match the features very well."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "38cbAf2RpHtI"
      },
      "source": [
        "## Summary of Case Study"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "AjNDlFMMMsTc"
      },
      "source": [
        "When debugging ML models, you should first attempt to diagnose the problem and apply the appropriate fix. For example, if you had changed your optimizer using `optimizer='sgd'`, then your model also converges faster. However, the problem was not with the optimizer but with the learning rate. Changing the optimizer only helps because `optimizer='sgd'` has a higher default learning rate than `optimizer='adam'`.\n",
        "\n",
        "Alternatively, you could train the model for longer with the default learning rate. However, in real-world ML, models take long to train. You should keep your training cycles as short as possible. Therefore, increasing the learning rate is the correct fix.\n",
        "\n",
        "These options demonstrate how debugging in ML is n-dimensional, and therefore you must use your understanding of model mechanics to narrow down your options. Because running experiments in ML is time consuming, requires careful setup, and can be subject to reproducibility issues, it's important to use your understanding of model mechanics to  narrow down options without having to experiment.\n",
        "\n",
        "Lastly, according to development best practices, you should transform your feature data appropriately. This Colab did not transform the feature data because transformation is not required for convergence. However, you should always transform data appropriately. Here, you could normalize your feature data using z-score or scale the feature data to [0,1]. "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "HEGEERnbglN9"
      },
      "source": [
        "# Exploding Gradients"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "s92MiwHIgm58"
      },
      "source": [
        "A common problem in model training is a loss that \"explodes\" or becomes `nan`. A common cause is anomalous feature data, such as outliers and `nan` values, or a high learning rate. The following sections demonstrate these causes."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "wOg62A4KiLk3"
      },
      "source": [
        "## Cause: High Learning Rate\n",
        "\n",
        "In this section, you will create data in the range [0,50] and show that the gradient explodes when you train the model using a learning rate of 0.01. Then you'll reduce the learning rate to make the model converge.\n",
        "\n",
        "Create and visualize the data by running the following code."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "id": "826oEnhXOi2O",
        "colab": {}
      },
      "source": [
        "# create data with large values\n",
        "features = np.array(range(50))\n",
        "# generate labels\n",
        "labels = features + np.random.random(features.shape) - 0.5\n",
        "\n",
        "# Transpose data for input\n",
        "[features, labels] = [features.transpose(), labels.transpose()]\n",
        "\n",
        "plt.scatter(range(len(features)), features)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Ckm1y8fCZ1s0"
      },
      "source": [
        "Run the following cell to train a model with a learning rate of 0.01. You will get `inf` for your loss."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "xE8LTD1CZy98",
        "colab": {}
      },
      "source": [
        "# Train on raw data\n",
        "model = None\n",
        "model = keras.Sequential()\n",
        "model.add(keras.layers.Dense(1, input_dim=1, activation='linear'))\n",
        "model.compile(optimizer=keras.optimizers.SGD(0.01), loss='mse')\n",
        "model.fit(features, labels, epochs=5, batch_size=10, verbose=1)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "A9QgTarmdWxu"
      },
      "source": [
        "To demonstrate that the high learning rate makes the loss explore, reduce the learning rate to `0.001`. Your loss will converge."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "RWRyRZaXt_l9"
      },
      "source": [
        "# Conclusion"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "4NmLFsc8Gz67"
      },
      "source": [
        "This Colab demonstrated the following principles.\n",
        "\n",
        "* The n-dimensional nature of debugging in ML makes ML debugging hard.\n",
        "* For effective debugging, understanding model mechanics is important.\n",
        "* Start with a simple model.\n",
        "* Exploding gradients incorrect normalization in the model, mis-configuration of FeatureColumns, etc., than raw data containing NaNs."
      ]
    }
  ]
}
